
Schema evolution and type casting

Overview

The DataStori app attempts to cast each incoming data column to the type of the matching column in the existing table schema. If a cast fails, it raises a ValueError with detailed error information.

Current State Analysis

The current implementation includes:

  • Complex validation logic with regex patterns for String→Numeric conversions
  • Precision loss warnings for Double/Float→Integer conversions
  • Special handling for NullType columns
  • Multiple validation cases before casting
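The checks listed above can be illustrated with a minimal pure-Python sketch. The regex patterns, the function names, the plain-string type names, and the use of the `warnings` module are all assumptions made for illustration; the app's actual patterns and code are not shown in this document.

```python
import re
import warnings

# Hypothetical patterns; the app's actual regexes are not shown in this document.
INTEGER_PATTERN = re.compile(r"\s*[+-]?\d+\s*")
DECIMAL_PATTERN = re.compile(r"\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?\s*")

INTEGRAL_TYPES = {"IntegerType", "LongType"}
FRACTIONAL_TYPES = {"DoubleType", "FloatType", "DecimalType"}


def invalid_numeric_strings(values, target_type):
    """Return the non-null strings that do not look like a valid number for target_type."""
    pattern = INTEGER_PATTERN if target_type in INTEGRAL_TYPES else DECIMAL_PATTERN
    return [v for v in values if v is not None and not pattern.fullmatch(v)]


def has_precision_loss(values):
    """Return True if any non-null float has a fractional part an integer cast would drop."""
    return any(v is not None and not float(v).is_integer() for v in values)


def validate_before_cast(column_name, values, source_type, target_type):
    """Run the pre-cast checks for one column (assumed order: NullType, strings, floats)."""
    if source_type == "NullType":
        return  # all-null column: left to schema evolution, nothing to validate
    if source_type == "StringType" and target_type in INTEGRAL_TYPES | FRACTIONAL_TYPES:
        bad = invalid_numeric_strings(values, target_type)
        if bad:
            raise ValueError(f"Column '{column_name}' has non-numeric values: {bad[:5]}")
    if source_type in {"DoubleType", "FloatType"} and target_type in INTEGRAL_TYPES:
        if has_precision_loss(values):
            warnings.warn(
                f"Column '{column_name}': casting {source_type} to {target_type} "
                "drops fractional digits"
            )
```

In this sketch, invalid strings stop the cast with an exception, while a Double/Float→Integer cast only produces a warning, which mirrors the split between errors and precision-loss warnings described above.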

Type Conversion Matrix

| Source Type | Target Type | Conversion Result | Error Condition |
|------------|-------------|------------------|-----------------|
| NullType | Any Type | Schema evolution (no cast) | Never errors |
| StringType | IntegerType | Parses string to integer | Invalid numeric strings |
| StringType | LongType | Parses string to long | Invalid numeric strings |
| StringType | DoubleType | Parses string to double | Invalid numeric/decimal strings |
| StringType | FloatType | Parses string to float | Invalid numeric/decimal strings |
| StringType | DecimalType | Parses string to decimal | Invalid numeric/decimal strings |
| StringType | DateType | Parses date string | Invalid date format |
| StringType | TimestampType | Parses timestamp string | Invalid timestamp format |
| StringType | BooleanType | Parses boolean string | Invalid boolean string |
| IntegerType | LongType | Widening (safe) | Never errors |
| IntegerType | DoubleType | Widening (safe) | Never errors |
| IntegerType | FloatType | Widening (safe) | Never errors |
| IntegerType | DecimalType | Conversion (safe) | Never errors |
| IntegerType | StringType | String representation | Never errors |
| LongType | IntegerType | Narrowing (may overflow) | Value exceeds Integer range |
| LongType | DoubleType | Widening (safe) | Never errors |
| LongType | FloatType | Widening (safe) | Never errors |
| LongType | DecimalType | Conversion (safe) | Never errors |
| LongType | StringType | String representation | Never errors |
| DoubleType | IntegerType | Truncation (precision loss) | Never errors (truncates) |
| DoubleType | LongType | Truncation (precision loss) | Never errors (truncates) |
| DoubleType | FloatType | Narrowing (precision loss) | Never errors (may lose precision) |
| DoubleType | DecimalType | Conversion | Never errors |
| DoubleType | StringType | String representation | Never errors |
| FloatType | IntegerType | Truncation (precision loss) | Never errors (truncates) |
| FloatType | LongType | Truncation (precision loss) | Never errors (truncates) |
| FloatType | DoubleType | Widening (safe) | Never errors |
| FloatType | DecimalType | Conversion | Never errors |
| FloatType | StringType | String representation | Never errors |
| DecimalType | IntegerType | Truncation (precision loss) | Never errors (truncates) |
| DecimalType | LongType | Truncation (precision loss) | Never errors (truncates) |
| DecimalType | DoubleType | Conversion | Never errors |
| DecimalType | FloatType | Conversion | Never errors |
| DecimalType | StringType | String representation | Never errors |
| DateType | TimestampType | Adds time component (00:00:00) | Never errors |
| DateType | StringType | String representation | Never errors |
| TimestampType | DateType | Removes time component | Never errors |
| TimestampType | StringType | String representation | Never errors |
| BooleanType | StringType | String representation | Never errors |
| BooleanType | IntegerType | true→1, false→0 | Never errors |
| Any Type | NullType | Not supported | Always errors |
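
To make a few rows of the matrix concrete, here is a minimal pure-Python sketch of three value-level conversions: LongType→IntegerType (narrowing with a range check), DoubleType→IntegerType (truncation), and BooleanType→IntegerType. The function names are hypothetical, the 32-bit signed range for IntegerType is an assumption, and the sketch only covers finite numeric values; it is not the app's implementation.

```python
# Assumed 32-bit signed range for IntegerType.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def long_to_integer(value):
    """Narrowing: raise when the value does not fit the Integer range."""
    if value is None:
        return None
    if not INT_MIN <= value <= INT_MAX:
        raise ValueError(f"Value {value} exceeds Integer range")
    return value


def double_to_integer(value):
    """Truncation: drop the fractional part (toward zero); finite values only."""
    return None if value is None else int(value)


def boolean_to_integer(value):
    """Map true to 1 and false to 0."""
    return None if value is None else int(value)
```

Note the asymmetry the matrix describes: the narrowing conversion can fail on out-of-range values, whereas truncation silently changes the value without raising.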

Strategy-Specific Behavior

Full Refresh (OVERWRITE)

  • Behavior: Existing table is completely replaced
  • Casting: Incoming data is cast to match existing schema before overwrite
  • Schema Evolution: Uses overwriteSchema=true, so the table schema can change
  • Impact: If casting fails, entire operation fails before write

Full Refresh Append

  • Behavior: Incoming data is appended to existing table
  • Casting: Incoming data is cast to match existing schema before append
  • Schema Evolution: Uses mergeSchema=true for new columns
  • Impact: If casting fails, entire operation fails before append
  • Note: Existing rows are preserved; only incoming rows are cast

Incremental Dedupe History (UPSERT)

  • Behavior: Updates matching rows, inserts new rows
  • Casting: Incoming data is cast to match existing schema before merge
  • Schema Evolution: Uses mergeSchema=true for new columns
  • Impact: If casting fails, entire operation fails before merge
  • Note: Existing rows keep their types; only incoming rows are cast

Incremental Drop and Load

  • Behavior: Deletes matching rows, then appends incoming data
  • Casting: Incoming data is cast to match existing schema before append
  • Schema Evolution: Uses mergeSchema=true for new columns
  • Impact: If casting fails, entire operation fails before delete/append
  • Note: Matching existing rows are deleted; incoming rows are cast and appended
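
The four strategies share one pattern: cast first, then perform the write steps, so a casting failure aborts the operation before the table is touched. The sketch below illustrates that ordering in plain Python. The strategy keys, the `run_strategy` function, and the `cast` and `write_step` callables are hypothetical names introduced for illustration; only the schema option names and the step order come from the descriptions above.

```python
from typing import Callable, Dict, List

# Hypothetical strategy keys; option names and step order follow the text above.
STRATEGIES: Dict[str, dict] = {
    "full_refresh_overwrite": {"schema_option": "overwriteSchema", "steps": ["overwrite"]},
    "full_refresh_append": {"schema_option": "mergeSchema", "steps": ["append"]},
    "incremental_dedupe_history": {"schema_option": "mergeSchema", "steps": ["merge"]},
    "incremental_drop_and_load": {"schema_option": "mergeSchema", "steps": ["delete", "append"]},
}


def run_strategy(
    name: str,
    cast: Callable[[], None],
    write_step: Callable[[str], None],
) -> List[str]:
    """Cast incoming data, then run the strategy's write steps in order."""
    plan = STRATEGIES[name]
    cast()  # raises ValueError on failure, so no step below ever runs
    executed = []
    for step in plan["steps"]:
        write_step(step)
        executed.append(step)
    return executed
```

Because `cast()` runs before the loop, a failed cast in "incremental_drop_and_load" leaves the matching rows undeleted, which is the "fails before delete/append" behavior described above.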

Error Message Format

ValueError: Failed to cast column '{column_name}' from {source_type} to {target_type}. 
Error: {original_error_message}
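
As an illustration of how a message in this format might be produced, the following pure-Python sketch wraps a failing per-value conversion and re-raises it as a ValueError. The `cast_column` function, its parameters, and the `caster` callable are hypothetical; the line break before "Error:" is assumed from the layout of the template above.

```python
def cast_column(column_name, values, source_type, target_type, caster):
    """Apply caster to every non-null value; wrap any failure in the documented format."""
    try:
        return [None if v is None else caster(v) for v in values]
    except (ValueError, TypeError, OverflowError) as exc:
        # Keep the original exception text and chain it for debugging.
        raise ValueError(
            f"Failed to cast column '{column_name}' from {source_type} to {target_type}.\n"
            f"Error: {exc}"
        ) from exc
```

For example, `cast_column("qty", ["1", "x"], "StringType", "IntegerType", int)` raises a ValueError whose message names the column and both types, followed by the original parsing error from `int`.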